Syllable-Based Multi-POSMORPH Annotation for Korean Morphological Analysis and Part-of-Speech Tagging
Abstract
Various research approaches have attempted to solve the length difference problem between the surface form and the base form of words in the Korean morphological analysis and part-of-speech (POS) tagging task. The compound POS tagging method is a popular approach, which tackles the problem using annotation tags. However, a dictionary is required for post-processing to recover the base form and dissolve the ambiguity of compound POS tags, which degrades system performance. In this study, we propose a novel syllable-based multi-POSMORPH annotation method that solves the length difference problem within one framework, without dictionary-based post-processing. A multi-POSMORPH tag is created by combining POS tags and morpheme syllables for simultaneous POS tagging and morpheme recovery. The model is implemented with a two-layer transformer encoder, which is lighter than existing models based on large language models. Nonetheless, experiments demonstrate that the performance of the proposed model is comparable to, or better than, that of previous models.
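The idea of a tag that carries both POS labels and base-form syllables can be illustrated with a minimal sketch. The tag format below (pieces of the form `B-syllable/POS` or `I-syllable/POS`, joined by `+`) and the `decode` function are hypothetical and invented for illustration only; they are not the paper's actual tag inventory. The POS labels are assumed to be Sejong-style (VV, EP, EF, NNG, JKB). The sketch only shows why a per-syllable tag of this kind can recover a base form that is longer than the surface form without a dictionary lookup.

```python
# Hypothetical tag format (an assumption, not the paper's scheme):
#   each surface syllable carries one tag made of one or more
#   "<B|I>-<base syllable>/<POS>" pieces joined by "+".
#   B starts a new morpheme, I continues the previous one.

def decode(tags):
    """Recover (morpheme, POS) pairs from per-syllable tags, with no dictionary."""
    morphemes = []
    for tag in tags:
        for piece in tag.split("+"):
            boundary, rest = piece.split("-", 1)
            syllable, pos = rest.rsplit("/", 1)
            if boundary == "B" or not morphemes:
                # Start a new morpheme with this base-form syllable.
                morphemes.append([syllable, pos])
            else:
                # Continue the current morpheme.
                morphemes[-1][0] += syllable
    return [(text, pos) for text, pos in morphemes]

# Surface form "갔다" has two syllables, but its base form has three
# (가 + 았 + 다), so the first syllable's tag carries two pieces.
past_tense = decode(["B-가/VV+B-았/EP", "B-다/EF"])

# Surface form "학교에": a two-syllable noun followed by a particle.
noun_phrase = decode(["B-학/NNG", "I-교/NNG", "B-에/JKB"])
```

In this sketch, the first syllable of "갔다" alone yields two morphemes, which is where the length difference between surface and base form is absorbed into the tag itself.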
Similar resources
Syllable-Pattern-Based Unknown-Morpheme Segmentation and Estimation for Hybrid Part-of-Speech Tagging of Korean
Most errors in Korean morphological analysis and part-of-speech (POS) tagging are caused by unknown morphemes. This paper presents a syllable-pattern-based generalized unknown-morpheme-estimation method with POSTAG (POStech TAGger), which is a statistical and rule-based hybrid POS tagging system. This method of guessing unknown morphemes is based on a combination of a morpheme pattern dictionary...
Machine Aided Error-Correction Environment for Korean Morphological Analysis and Part-of-Speech Tagging
Statistical methods require a very large, high-quality corpus, but building a large and faultless annotated corpus is very difficult. This paper proposes an efficient method to construct a part-of-speech tagged corpus. A rule-based error correction method is proposed to find and correct errors semi-automatically using user-defined rules. We also make use of the user's correction log to reflect feed...
Part-of-Speech-Tagging using morphological information
This paper presents the results of an experiment to decide the question of authenticity of the supposedly spurious Rhesus, an Attic tragedy sometimes credited to Euripides. The experiment involves the use of statistics to test whether significant deviations in the distribution of word categories between Rhesus and the other works of Euripides can be found. To count frequencies of...
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.
Syllable-based probabilistic morphological analysis model of Korean
In this paper, we present a syllable-based probabilistic morphological analysis model of Korean. While previous morphological analyzers regard the morpheme as a processing unit, the model exploits the syllable as a processing unit in order to endure the unknown word problem. Actually, it does not use any morpheme dictionary. In contrast to the previous systems that depend on manually constructe...
Journal
Journal title: Applied Sciences
Year: 2023
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app13052892